{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Remote Validation Inference\n",
    "\n",
    "## The problem\n",
    "\n",
    "As a concept, [guardrailing](https://www.guardrailsai.com/docs/concepts/guard) has a few areas that, when unoptimized, can introduce latency and be extremely resource-expensive. The main two areas are: \n",
    "\n",
    "* Guardrailing orchestration; and\n",
    "* ML models that validate a single guard\n",
    "\n",
    "These are resource-heavy in slightly different ways. ML models can run with low latency on GPU-equipped machines. (Some ML models used for validation run in tens of seconds on CPUs, while they run in milliseconds on GPUs.) Meanwhile, guardrailing orchestration benefits from general memory and compute resources. \n",
    "\n",
    "## The Guardrails approach\n",
    "\n",
    "The Guardrails library tackles this problem by providing an interface that allows users to separate the execution of orchestration from the execution of ML-based validation.\n",
    "\n",
    "The layout of this solution is a simple upgrade to validator libraries themselves. Instead of *always* downloading and installing ML models, you can configure them to call a remote endpoint. This remote endpoint hosts the ML model behind an API that presents a unified interface for all validator models. \n",
    "\n",
    "Guardrails hosts some of these for free as a preview feature. Users can host their own models by following the same interface.\n",
    "\n",
    "\n",
    ":::note\n",
    "Remote validation inferencing is only available in Guardrails versions 0.5.0 and above.\n",
    ":::\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using Guardrails inferencing endpoints\n",
    "\n",
    "To use a guardrails endpoint, find a validator that has implemented support. Validators with a Guardrails-hosted endpoint are labeled as such on the [Validator Hub](https://hub.guardrailsai.com). One example is [Toxic Language](https://hub.guardrailsai.com/validator/guardrails/toxic_language).\n",
    "\n",
    "\n",
    ":::note\n",
    "To use remote inferencing endpoints, you need a Guardrails API key. You can get one by signing up at [the Guardrails Hub](https://hub.guardrailsai.com). \n",
    "\n",
    "Then, run `guardrails configure`.\n",
    ":::"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "guardrails hub install hub://guardrails/toxic_language --quiet;\n",
    "\n",
    "# This will not download local models if you opted into remote inferencing during guardrails configure\n",
    "# If you did not opt in, you can explicitly opt in for just this validator by passing the --no-install-local-models flag\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From here, you can use the validator as you would normally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from guardrails import Guard\n",
    "from guardrails.hub import ToxicLanguage\n",
    "\n",
    "guard = Guard().use(ToxicLanguage())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The benefit of hosting a validator inference endpoint is the increase in speed and throughput compared to running locally. This implementation makes use cases such as [streaming](https://www.guardrailsai.com/docs/concepts/streaming) much more viable in production.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import display, clear_output\n",
    "\n",
    "fragment_generator = guard(\n",
    "    model=\"gpt-3.5-turbo\",\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
    "        {\"role\": \"user\", \"content\": \"Tell me about the Apple Iphone.\"},\n",
    "    ],\n",
    "    max_tokens=1024,\n",
    "    temperature=0,\n",
    "    stream=True,\n",
    ")\n",
    "\n",
    "\n",
    "accumulated_output = \"\"\n",
    "for op in fragment_generator:\n",
    "    clear_output()\n",
    "    accumulated_output += op.validated_output\n",
    "    display(accumulated_output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Toggling remote inferencing\n",
    "\n",
    "To enable/disable remote inferencing, you can run the CLI command `guardrails configure` or modify your `~/.guardrailsrc`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "# To disable\n",
    "guardrails configure --disable-remote-inferencing\n",
    "\n",
    "# To enable\n",
    "guardrails configure --enable-remote-inferencing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To disable remote inferencing from a specific validator, add a `use_local` kwarg to the validator's initializer. \n",
    "\n",
    ":::note\n",
    "When running locally, you may need to reinstall the validator with the `--install-local-models` flag.\n",
    ":::"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from guardrails import Guard, install\n",
    "\n",
    "try:\n",
    "    from guardrails.hub import ToxicLanguage\n",
    "except ImportError:\n",
    "    install(\"hub://guardrails/toxic_language\", install_local_models=True)\n",
    "    from guardrails.hub import ToxicLanguage\n",
    "\n",
    "# uses validator locally.\n",
    "guard.use(ToxicLanguage(use_local=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hosting your own endpoint\n",
    "\n",
    "Validators can point to any endpoint that implements the interface that Guardrails validators expect. This interface can be found in the `_inference_remote` method of the validator.\n",
    "\n",
    "After implementing this interface, you can host your own endpoint (for example, [using gunicorn and Flask](https://flask.palletsprojects.com/en/stable/deploying/gunicorn/)) and point your validator to it by setting the `validation_endpoint` constructor argument.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "guard = Guard().use(\n",
    "    ToxicLanguage(\n",
    "        use_local=False,\n",
    "        validation_endpoint=\"your_endpoint_ip_address\",\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ":::note\n",
    "\n",
    "Contact us to host validators in your own VPC with managed hardware.\n",
    "\n",
    ":::\n",
    "\n",
    "# Learn more\n",
    "\n",
    "To learn more about hosting your own validators, check out the [Host Remote Validator Models doc](/docs/how_to_guides/hosting_validator_models).\n",
    "\n",
    "To learn more about writing your own validators, check out the [Custom validators doc](/docs/how_to_guides/custom_validator)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
